XsyevBatched! interface accepting 3D StridedCuArray #2951
Your PR no longer requires formatting changes. Thank you for your contribution!
499039f to 258eefe
Signed-off-by: Steven Hahn <hahnse@ornl.gov>
Signed-off-by: Steven Hahn <hahnse@ornl.gov>
258eefe to e3dbe9b
CUDA.jl Benchmarks
| Benchmark suite | Current: e3dbe9b | Previous: 2e983fe | Ratio |
|---|---|---|---|
| latency/precompile | 56489805251.5 ns | 56427085830.5 ns | 1.00 |
| latency/ttfp | 8293952143.5 ns | 8362501410 ns | 0.99 |
| latency/import | 4494056576 ns | 4521778039 ns | 0.99 |
| integration/volumerhs | 9623518.5 ns | 9624952.5 ns | 1.00 |
| integration/byval/slices=1 | 147182 ns | 146870 ns | 1.00 |
| integration/byval/slices=3 | 425892.5 ns | 425790 ns | 1.00 |
| integration/byval/reference | 145072.5 ns | 144866 ns | 1.00 |
| integration/byval/slices=2 | 286243 ns | 286021 ns | 1.00 |
| integration/cudadevrt | 103470 ns | 103323 ns | 1.00 |
| kernel/indexing | 14044 ns | 14090 ns | 1.00 |
| kernel/indexing_checked | 14727 ns | 14977.5 ns | 0.98 |
| kernel/occupancy | 669.2784810126582 ns | 670.5886075949367 ns | 1.00 |
| kernel/launch | 2196.8888888888887 ns | 2115.8 ns | 1.04 |
| kernel/rand | 14816 ns | 16842 ns | 0.88 |
| array/reverse/1d | 19812.5 ns | 19633 ns | 1.01 |
| array/reverse/2dL_inplace | 66690 ns | 66698 ns | 1.00 |
| array/reverse/1dL | 69959.5 ns | 69881 ns | 1.00 |
| array/reverse/2d | 21801 ns | 21367 ns | 1.02 |
| array/reverse/1d_inplace | 9785 ns | 9601 ns | 1.02 |
| array/reverse/2d_inplace | 13177 ns | 13220 ns | 1.00 |
| array/reverse/2dL | 73804 ns | 73483 ns | 1.00 |
| array/reverse/1dL_inplace | 66861 ns | 66751 ns | 1.00 |
| array/copy | 20434 ns | 20712 ns | 0.99 |
| array/iteration/findall/int | 156396 ns | 156846 ns | 1.00 |
| array/iteration/findall/bool | 139124 ns | 139935.5 ns | 0.99 |
| array/iteration/findfirst/int | 161010.5 ns | 160606 ns | 1.00 |
| array/iteration/findfirst/bool | 162133 ns | 161405 ns | 1.00 |
| array/iteration/scalar | 72390.5 ns | 72218 ns | 1.00 |
| array/iteration/logical | 215485 ns | 215761.5 ns | 1.00 |
| array/iteration/findmin/1d | 49862 ns | 49669 ns | 1.00 |
| array/iteration/findmin/2d | 96114.5 ns | 96275.5 ns | 1.00 |
| array/reductions/reduce/Int64/1d | 42907 ns | 43492 ns | 0.99 |
| array/reductions/reduce/Int64/dims=1 | 44354 ns | 44664.5 ns | 0.99 |
| array/reductions/reduce/Int64/dims=2 | 61379 ns | 61641 ns | 1.00 |
| array/reductions/reduce/Int64/dims=1L | 88845 ns | 88640 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 87880 ns | 87635.5 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 36503 ns | 36681 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1 | 47658 ns | 48806 ns | 0.98 |
| array/reductions/reduce/Float32/dims=2 | 59568 ns | 59459 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1L | 52285 ns | 52065 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 72078 ns | 71664 ns | 1.01 |
| array/reductions/mapreduce/Int64/1d | 43149 ns | 43256 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1 | 44780 ns | 44863 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2 | 61149 ns | 61500 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=1L | 88670 ns | 88638 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 88056.5 ns | 87897.5 ns | 1.00 |
| array/reductions/mapreduce/Float32/1d | 36578 ns | 36277.5 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=1 | 41404.5 ns | 41259 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2 | 59539.5 ns | 59440 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 52439 ns | 52331.5 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 71900 ns | 71656.5 ns | 1.00 |
| array/broadcast | 19823 ns | 19817 ns | 1.00 |
| array/copyto!/gpu_to_gpu | 11275 ns | 11436 ns | 0.99 |
| array/copyto!/cpu_to_gpu | 213575 ns | 215179 ns | 0.99 |
| array/copyto!/gpu_to_cpu | 282375 ns | 282618 ns | 1.00 |
| array/accumulate/Int64/1d | 124207 ns | 124273 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 83056 ns | 83182 ns | 1.00 |
| array/accumulate/Int64/dims=2 | 157765.5 ns | 157485 ns | 1.00 |
| array/accumulate/Int64/dims=1L | 1709258.5 ns | 1709450 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 966565 ns | 966304 ns | 1.00 |
| array/accumulate/Float32/1d | 108669 ns | 108932 ns | 1.00 |
| array/accumulate/Float32/dims=1 | 80172 ns | 80065 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 146909 ns | 146929 ns | 1.00 |
| array/accumulate/Float32/dims=1L | 1618657 ns | 1618534.5 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 697757 ns | 697506 ns | 1.00 |
| array/construct | 1284.5 ns | 1270.6 ns | 1.01 |
| array/random/randn/Float32 | 44191 ns | 47947 ns | 0.92 |
| array/random/randn!/Float32 | 24824 ns | 24918 ns | 1.00 |
| array/random/rand!/Int64 | 27206 ns | 27167 ns | 1.00 |
| array/random/rand!/Float32 | 8765 ns | 8884.333333333334 ns | 0.99 |
| array/random/rand/Int64 | 29627 ns | 37695.5 ns | 0.79 |
| array/random/rand/Float32 | 12851 ns | 12943 ns | 0.99 |
| array/permutedims/4d | 55677 ns | 59797.5 ns | 0.93 |
| array/permutedims/2d | 53547 ns | 53660 ns | 1.00 |
| array/permutedims/3d | 54554 ns | 54666 ns | 1.00 |
| array/sorting/1d | 2757157 ns | 2757791.5 ns | 1.00 |
| array/sorting/by | 3343841 ns | 3344326 ns | 1.00 |
| array/sorting/2d | 1080788 ns | 1080588 ns | 1.00 |
| cuda/synchronization/stream/auto | 1033.0833333333333 ns | 1040 ns | 0.99 |
| cuda/synchronization/stream/nonblocking | 7428.8 ns | 6879.299999999999 ns | 1.08 |
| cuda/synchronization/stream/blocking | 826.0961538461538 ns | 805.0612244897959 ns | 1.03 |
| cuda/synchronization/context/auto | 1180.9 ns | 1175.2 ns | 1.00 |
| cuda/synchronization/context/nonblocking | 7994.2 ns | 7439.7 ns | 1.07 |
| cuda/synchronization/context/blocking | 904.6071428571429 ns | 896.560975609756 ns | 1.01 |
This comment was automatically generated by workflow using github-action-benchmark.
The test failure is due to #2971 and is unrelated to this PR.
kshyatt left a comment
LGTM, although updating the error messages would be nice.
Signed-off-by: Steven Hahn <hahnse@ornl.gov>
74b01e4 to 09d9025
I noticed that the batched eigensolver function CUSOLVER.heevjBatched! accepts a 3-dimensional StridedCuArray, while the similar function CUSOLVER.XsyevBatched! accepts only a 2-dimensional StridedCuMatrix. Having the same interface for both would be desirable.
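To illustrate how the two shapes relate, here is a minimal sketch using plain CPU arrays in place of a StridedCuArray. It assumes the 2-dimensional form stores the batch side by side as an n × (n * batch) matrix; that layout is an assumption made for illustration and is not stated in this PR. No CUSOLVER function is called, and the names `n`, `batch`, `A3` and `A2` are hypothetical.

```julia
# Plain CPU arrays stand in for StridedCuArray; no GPU or CUDA.jl call is made.
n, batch = 3, 4

# 3D form: matrix k of the batch is the slice A3[:, :, k].
A3 = zeros(Float64, n, n, batch)
for k in 1:batch
    A3[:, :, k] .= k          # fill matrix k with the value k
end

# ASSUMED 2D form: the same data viewed as an n × (n * batch) matrix.
A2 = reshape(A3, n, n * batch)

# Under column-major storage, matrix k occupies columns (k-1)*n+1 : k*n.
for k in 1:batch
    cols = ((k - 1) * n + 1):(k * n)
    @assert A2[:, cols] == A3[:, :, k]
end

# reshape shares memory with the original array, so no copy is made.
A2[1, 1] = -1.0
@assert A3[1, 1, 1] == -1.0
```

If that layout assumption holds, a 3D method could pass the same memory through to the existing 2D code path without copying, which is one way a unified interface might be provided.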